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Abstract 

We introduce the concept of multidimensional antithetic as the absolute minimum of the 
covariance function defined on the orthogonal group hy A Gov (/(^), /(j45)) where ^ is a 
standard A'^-dimensional normal random variable and / : R'^ — > R is an almost everywhere 
differentiable function. The antithetic matrix is designed to optimise the calculation of E[/(^)] 
in a Monte Carlo simulation. We present an iterative annealing algorithm that dynamically 
incorporates the estimation of the antithetic matrix within the Monte Carlo calculation. 

Keywords: Antithetic variates, Monte Carlo method, Robbins-Monro algorithms, simulated an- 
nealing. 



1 Introduction 

The valuation of financial derivatives is based on the resolution of a parabolic partial differential 
equation defined by the chosen dynamics for the underlying assets subject to boundary conditions 
defined by the product (see [MJ03| ). These equations are rarely solvable explicitly and a numerical 
method has to be chosen. The standard methods of choice in the industry are resolution on grids 
and Monte Carlo. The applicability of the Monte Carlo method is a consequence of the Feynman 
Kac theorem which solves a parabolic PDE in terms of an expectation. In fact, in many of the 
more complex equity products with a large dimensionality, the grid method is not efficient and 
Monte Carlo is de facto the only pricing method. In this approach the price can be written as an 
expectation 

m = E[/(0], (1.1) 

where ^ ~ A/"(0,Id7v) is an iV-dimensional standard normal random variable describing a random 
path of the underlying assets and / : ^ R is a measurable function representing the payoff of 
the derivative contract. The Monte Carlo method is essentially a transcription of the strong law of 
large numbers which claims that m can be approximated by 

1 " 

-E/(^')' (1-2) 
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where ^i, . . . , ^„ are independent simulations of the random variable ^. 

The main drawback of Monte Carlo methods is that they are usually computationally demand- 
ing, often putting great strains on the capability of a trading operation to properly monitor its risks 
and manage complex positions. Therefore, it is of great value to design methods to improve the 
performance of the Monte Carlo calculation. It is the aim of this article to present such a method. 

In order to speed up the Monte Carlo method we can seek to decrease the simulation error. A 
measure of this error is given by a j \fn for a large enough number of simulations n f [HH64] ). where 
— v^Var [/(^)] is the standard deviation of / (^). Like the expectation m, a is usually unknown 
and needs to be estimated. Variance reduction techniques are methods that reduce this error 
by replacing / (^) by a different random variable which has the same expectation but a smaller 
variance. Hopefully, this will ensure a faster convergence of the Monte Carlo method. Antithetic 
variates is one such method. 

Antithetic variates appear for the first time in the seminal work of Hammersley, Morton, and 
Mauldon [HM561 IHM56aj . The crucial idea of this procedure is to recycle the simulations of ^ as 
samples of — ^, which has the same distribution as ^. Therefore, we can approximate m in by 

The trivial equality 

Var [/ (0 + / = 2 Var [/(O] + 2 Cov (/(O, /(-C)) 

shows that if the dependence between /(^) and /(— ^) is such that Cov (/(C), /{—£,)) < 0: then the 
accuracy of (|1.3p will be greater than that of the crude Monte Carlo (|1.2p . 

Antithetic variates were developed in the case ^ is one dimensional. For higher dimensional 
normal variables, practitioners have often used antithetics by changing the sign of some components 
of the random normal vector in a more or less haphazard manner. In fact, there is a larger underlying 
group of symmetries since, for any orthogonal matrix A G 0{N), A^ is again a standard normal 
vector. We can therefore extend the approach of Hammersley et al. to higher dimensions by 
replacing the role of — ^ above by A^ and approximating m as 

]_j2lM±I^. (1.4) 

An antithetic method corresponds to the choice of A G 0{N) and the optimal antithetic is the 
matrix that minimizes the covariancc function 

A ^ Cov (/(C), /(AO). 

Note that, 0{N) being a compact group, this function will always have an absolute minimum. The 
purpose of our work is to propose an algorithm to locate this optimal antithetic matrix A* € 0{N), 
provided the covariance is negative for some value A £ 0{N). The present paper is the first step of 
a program whose aim is to describe optimal antithetics to price some popular complex derivatives 
such as baskets, cliquets, Himalaya options and the like. 
Finally, note that solving 

min Cov(/(e),/(AC)) 

AgO(JV) 

is equivalent to solving the simpler problem 



(1.5) 
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and that both are well posed. Indeed, since 0{N) is compact, there exists & A* G 0{N) such that 
min^go(^) E [/ (^) / {A^)] = E [/ (^) / provided that / is continuous. More restrictively, we 

will assume throughout this paper that / is everywhere continuous and continuously differentiable 
except on a set of zero (Lebesgue) measure. 

1.1 Contents 

The paper is structured as follows: 

In Section [2] we recall some results about simulated annealing algorithms. These algorithms are 
designed to find the global minima of a given function defined on R*" by stochastically perturbing 
a gradient based algorithm which, by itself, would normally only converge to a local minimum. 

Section [3] contains a detailed discussion on the exponential covering map from the Lie algebra 
so(A^) to 0{N). The coordinates induced by the exponctial map seem to be the best suited for the 
optimization problem at hand. 

In Section |4l we introduce a novel iterative simulated annealing algorithm adapted to the state 
space 0{N). This algorithm provides a sequence {^fcjfceN C 0{N) of random variables converging 
in probability to the global minimum A* of (|1.5p . The efhciency and performance of the algorithm 
are checked in Section O where we use it to find the optimal antithetic for an Asian call option and 
a covariance swap. 

Finally, in Section[Sl we define a dynamical antithetic technique where we use {Ak}ki£ri C 0{N) 
to estimate (jl.ip . More concretely, we define the sequence 

fe=i 

and we prove that (II. 6p converges almost surely to m = E[/(^)], and that y/n{Sn ~ rn) converges 
in law to a normal variable M (O, of) with variance 

^* = ^(Var[/(e)]+Cov(/(0,/(A*0)). 

That is, the estimated error of the Monte Carlo method (|1.6p was reduced as much as it was possible 
using antithetics. 

1.2 Notation 

Throughout this article, {Q,J-',P) will denote the underlying probability space, where is a cr- 
algebra and P : ^ [0, 1] is a probability measure. The space of all R^-valued random variables 
will be denoted by L'^n (O, P). A standard Gaussian vector ^ : — > R^ will be a random variable 
with a probability density function given by 

with respect to the Lebesgue measure A. We will write ^ ~ A/'(0,Id7v) to mean ^ is a standard 
Gaussian vector. On the other hand, C^^ (R^) will denote the set of real functions / : R^ R 
which arc continuously differentiable except on a set of zero Lebesgue measure. 
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2 Simulated annealing algorithms 

In this section we review some of the existing methods to deal with the problem of locating absolute 
minima for a function of the form 

17 : R"^ — > R 

X ^ E[{/(a;,0], 

where ^ ~ A/" (0, Id^v) is a Gaussian random vector and U : K'^ X ^ K is a measurable function 
continuously differentiable with respect to the first r entries. Actually, in our original problem, U 
is defined on the compact Lie group 0{N) rather than W; this additional feature will be dealt with 
later. 

The absolute minima of the function U above will be zeroes of its gradient field VJ7 which can, 
under regularity conditions, be calculated as E [S/ xU {x , £,)] . Hopefully, one may expect to solve 
VJ7 (x) = by some numerical scheme such as the gradient method. However, the equation 

VUix) = E[V:rC/(a;,0] = 

is defined through an expectation. This means that, in general, the gradient VJ7 needs to be 
evaluated by a Monte Carlo simulation, which can be too expensive. Recall that we want to solve 
(jl.Sp in order to improve the efficiency of a (crude) Monte Carlo. Therefore, calculating the gradient 
by Monte Carlo estimation in order to improve our original (crude) Monte Carlo makes no sense. 
The solution is worse than the problem. In order to overcome this difficulty, we try to work directly 
with VxUix,^). 



2.1 Robbins-Monro algorithms 

Let F (•, ^) : W KJ" be a measurable vector field depending on a Gaussian vector ^ G A/'(0, Mat). 
Robbins-Monro algorithms ( (MR51| ) are designed to solve an equation of the form E [F{x, ^)] = 
by means of a scheme algorithm such as 

a;„ = a;„„i + 7„F (a;„_i,^„) , (2.1) 

where {'^n}neN is a non-negative sequence of real numbers and {^nlnsN are independent Gaussian 
vectors. Observe that, in our particular case, F = —VxU. We will set F{x) — E [F{x,£^)]. 

Theorem 1 ((D9ll Theorem 1.4.26]) A ssume that F{x) has a zero x* . Then the sequence de- 
fined by \2.1\) converges almost surely to x* for almost all initial conditions xq provided that 

Al. (^x - X* ,F {x)) < for any X eW . 
A2. E„gn7« = oo and 'Enen^n < oo. 

i<" (a;„_i, ^„)||^ I J-'n-i < ^ (^1 + llCnll^^ '^•S- for some constant K > where we set 
Tn-\ to he the a-algebra generated by the random variables {$,k \ k < n — 1}. 

Here (•, •) and \\ ■ \\ denote the Euclidean scalar product and norm respectively. 

Remark 2 An example of a sequence {7n}neN verifying the conditions of A2 is — ^ for some 
constant c > 0. 



A3. E 
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Among the hypotheses stated in Theorem [1] Al is the most problematic. It means, on the 
one hand, that x* is a unique zero of the vector field F and, on the other, that — is a 
Lyapunov function for F so that x* is an asymptotically stable equilibrium point. When F = —VU, 
this condition is sometimes hidden behind the requirement that U (x) is convex and has a unique 
minimum. Unfortunately, both conditions are too restrictive. In fact, by Morse theory (see |M63j ). 
the number of critical points of a function defined on 0{N) such as our original covariance function 
has to be at least the dimension of the total rational homology space of 0{N), which is 2L^/^J+-'^ 
(see |MT91I Corollary III.3.15]). 

There are variants of the Robbins-Monro algorithms that are guaranteed to converge to critical 
points of F which are actual minima of U. However, there is no certainty that these will be absolute 
minima. In the next subsection, we will discuss simulated annealing methods which will converge 
to global minima. 

Robbins-Monro algorithms combined with importance sampling methods have also been suc- 
cessfully used in the context of derivative pricing to reduce the variance of Monte Carlo simulations 
(see [A03) . |DV98j . and [FS02] ). For example, if the price of a derivative is E[/(^)] with 

^ ^ A/'(0, Id^v) as in (|l.ip . then a simple change of variables shows that 

E[/(0]-E[/(C-Hx)e---«- 

However, the variance of the variable under the second expectation operator now depends on x. 
So we will achieve faster convergence by choosing x that minimises Var[/(^ -|- a;)e~^'^~ 2 ] or, 
equivalently, 

C7(x) =E[/2(0e-"-«+5ll^ll']. (2.2) 

If turns out this function is convex and thus has a unique minimum. The Robbins-Monro algo- 
rithm (suitable modified with a truncation method to ensure condition A3) can then be applied to 
F{x,^) = Va;[/2(0e~"='^+^ll''ll'] to yield the optimal value of x e M'^ f [A04) . [A03] ). Note that in 
our problem the function to be minimised is not a measure change, and that it does not have the 
growth properties at infinity as a function of a; of 



2.2 Simulated annealing algorithms 

As we have mentioned, our minimisation problem will give rise to multiple critical points and thus 
is not suited to the Robbins-Monro scheme. A method that has been devised to deal with multiple 
minima is Simulated Annealing. This is a technique inspired in the metallurgy where a metal alloy 
is heated to pull it out of a equilibrium state, a local minimum of the energy, and then slowly cooled 
to allow atoms to diffuse into the lowest energy state where the system has some optimal physical 
property. The analogue of this heat injection in the Robbins-Monro algorithm (j2.1|) is the addition 
of an extra source of randomness that hits the approximating sequence out of local minima. In 
their most general form, simulated annealing algorithms are written as 

Xn+l ^ Xn-1 + JnF {Xn~l,Cn) + bnCn, (2.3) 

where {&n}neN C M is a real sequence, the annealing temperature scheme, and {CnjnsN a second 
sequence of i.i.d Gaussian random vectors independent from the {^„}„gN- In order to study their 
convergence, first of all, we need to introduce some notation. 

Let Bx,y denote the set of continuous paths ip : [0, 1] M'" starting at x e R*" and ending at 
y e W. Let U{x,^) be as above and set F{x,^) = -VxU{x,^) and U{x) = E[U{x,^)]. Recah that 
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our intention is to locate absolute minima of U. Let S_ be the set where the absolute minimum is 
achieved 

S ^ {x \U (x) ^ min U (y)} 

yeR'' 

and 5 = {a; e R'' | VJ7 {x) = 0} the set of critical points of U which we will assume has finitely 
many connected components. For any I > 0, let B {S_, I) be the set of points which are at a distance 
less than / from S. Define 



I{x,y) — inf 2 max U {(p{t)) — U{x) and d* = max mml{x,y) . (2-4) 
vE-Bx.H YO<t<i J xes\s_ves_ 

Observe that d* is a measure of how oscillatory the objective function U is. 

The next theorem provides sufficient conditions for guaranteeing the convergence of simulated 
annealing algorithms. It is extracted from ,FGQ97j Theorem 5.2]. 

Theorem 3 Let {r„}„gN C M and {ft.n}ngN C K 6e the sequences of real numbers defined by 
r-a = l/n'^, < 7 < 1, and hn = d/((l - 7) Inn), where d > d* . For {^njneN, {CnjneN two 
independent sequences of i.i.d Gaussian random vectors define an iteration scheme by 



Xn = Xn-1 - r'„Va;C/(x„_i,<^„) + \/r„/l„Cn- (2.5) 

Assume the function F{x) = — E[VxC/ (a;, ^)] satisfies the following properties: 
Bl. limsup||,,|^^ < A/i < oo, 

Cf(x) x) 

B2. limsup||^j[^oc ii^ii^i < -c < 0, and 

B3. limsup||,||^^ ^ < Ah < oo 

for some positive constants Mi, M2, and c. Then, for any I > 0, 

P {{xn ^ B {S, I)}) ^ I as n ^ 00 (2.6) 
uniformly for any initial condition in an arbitrary compact set. 
Remark 4 

1. If the absolute minimum x* of U is unique, then p.6p implies that the sequence {xnjneN 
converge in probability to x* . 

2. In fact, Fang et al. have proved a more general version of Theorem [3] for a wider spectrum 
of sequences {r„}„g[, and {/i„}ngN (see |FGQ97| for details). For example, they prove that 
Theorem [3] holds when r„ = b/n and /i„ = d/ In(lnn), & > 0, d > 0, and n € N. Simulated an- 
nealing algorithms with coefficients —b/n and ^bdj (n In(lnn)) have already been considered 
in |GM91|[GM93) in the particular case V^U {xn^i, ln) = ^xV (xn-i) +Cn with y : r -> R 
a deterministic function. 
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3. Hypotheses Bl, B2, and B3 are weaker than the corresponding hypotheses Al, A2, and 
A3 in the Robbins-Monro algorithm. This is because the former only impose restrictions 
at infinity. For example, if we know that the absolute minimum of the function U lies in 
a bounded domain, we can modify U far away from that domain so that Bl, B2, and B3 
automatically hold and {x„}„gN in (|2.5p converges. If we are certain that the minimum is 
within the ball B {x* , R) of radius R centered at some x* E W, then we can add a penalty 
function appropriately on a narrow spherical shell around ||a;|| — R such that —VxU strongly 
points towards B {x* , R) . 

4. As far as the rate of convergence of (|2.5p is concerned, it is also proved in |FGQ97| that there 
exists a constant S > such that, for any e > 0, 

P {{xn e B {S, I)}) > 1 - e if n > exp (~ ^j-^—^ \n{2e)^ . 
2.3 Continuous simulated annealing 

Simulated annealing is often presented in the literature in its continuous version. The stochasti- 
cally perturbed iterative algorithm is then replaced by a stochastic differential equation with drift 
—WxU the negative of the gradient of the function U E (W) we wish to minimise, and diffusion 
coefficient decaying to zero with time as in the annealing method. Unfortunately, as it will become 
clearer later on, this procedure will only work for deterministic functions; that is, functions which do 
not depend on a random variable as is indeed our case. However, we wish to review time-continuous 
simulated annealing for the benefit of a more complete exposition. Most of the results quoted here 
are extracted from BHT08 where Baudoin, Hairer, and Teichmann study the Ornstein-Uhlenbeck 
process on a compact Lie group and its properties to design efficient simulated annealing schemes. 
This recent paper improves considerably the efficiency of simulated annealing techniques developed 
so far (see |HKS89I ). The reader is also encouraged to check with [B08[ Chapter 5] for a more 
comprehensive approach. 

Let G be a compact Lie group and let £ = i 12t=i o be a second order differential operator 
acting on (G, /ic); where is the Haar measure and, for any i — 1, d, Vi £ q is & left invariant 
vector field. We assume that Hormander's hypoelliptic condition holds, i.e., that the Lie sub-algebra 
generated by {Vi, Vrf} coincides with g. In this context, the carre du champ operator F is 
defined as 

r {g, f)^C ifg) - fC (g) - gC (/) , f,gE L' (G, mg) ■ 

Let U E (G) be a differcntiablc function and let e E (0, 1] be the annealing temperature 
parameter. Assume that the following integral is finite 

Z, := / e-^^a^/'"dfiG{g) <oo. 

JG 

Then we can define the Gibbs measure fi^ by He {B) := -g- jge^^^^^/'^^ d^cig), B E B{G). 
Intuitively this measure concentrates on the minima of U as the temperature e falls to zero. The 
Gibbs measure is invariant under the differential operator = e^£ — -^^{U,-), which is the 
infinitesimal generator of the stochastic differential equation 



d 

dXt - Vo {XI) dt + sY^V, {XD 6BI 

i=l 



(2.7) 
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where Vq — —^T (U, •) S X(G) represents the negative gradient vector of the function U. This is 
the continuous version of the simulated anneahng process. Note that if the temperature e is set 
to zero, we are left with an ordinary differential equation where the carre du champ T{U, •) plays 
the role of the ordinary gradient. Its flow, hopefully, will evolve to the closest minimum of U. 
The addition of appropriately selected annealing schedule function e(t) will ensure we move to an 
absolute minimum. 

In |BHT08 the following remarkable result is proved: 

Theorem 5 Assume the annealing heat function is given by 



v/ln {R + 1) 

for positive constants c, i? > 0. Then, under mild conditions on U G (G), 

P {{Xf e Bs}) < My^A^,(,) {Bs), (2.8) 

where AI > 0, Bg := {g £ G \ U{g) > Uq + , and Uq is the absolute minimum ofU. In particular, 
provided that there exists only one element go G G such that U{go) = Uq, 12. implies that 

limE[/(Xf)] = /(.9o) 

t — *oo 

for any continuous bounded test function f £ C (G) . 

In other words, in order to find the global minima of U we can simulate numerically the stochas- 
tic differential equation (|2.7[) and approximate its invariant measure /ig which, as time goes by, 
concentrates around g^ G G. 

Unfortunately, our original problem does not fit directly in this framework. The drift vector 
field Vb — \T (U, •) in (|2.7p is assumed to be deterministic, it cannot depend on an independent 
Gaussian noise, as happens in our case: according to p.Sp . U = f {£,) f {A£^), where ^ ~ A/'(0, Mat), 
A G 0{N), and / e Gj^(R^). Removing this stochastic dependence would imply working with 
U{A) = E[/ (^) / (A^)]. As we already argued, it is computationally inconvenient to take the 
expectation at this point. 



3 Lie group methods 

We have so far reviewed the simulated annealing algorithms available to globally minimize functions 
defined on W through an expectation. However, our optimization problem (|1.5[) does not take place 
on an Euclidean space but on a Lie group, namely, the orthogonal group 0{N). Therefore, we need 
to adapt the results of Section [2] to 0{N) by taking suitable coordinates. As it is customary on 
Lie groups, we will take local coordinates by means of a local diffeomorphism from the Lie algebra, 
(fi : (N) 0{N), which is a Euclidean space. In this section we are going consider two different 
choices of ip: the Cayley transform and the exponential map. These are two of the most used 
coordinate patches in the design of numerical integrators for ordinary differential equations on Lie 
groups ( |IMPZ05| ). This section aims at giving explicit expressions for the gradient of a smooth 
function F : 0{N) — > R when composed with the local coordinates ip. Such a gradient will be 
employed later in our simulated annealing algorithm. 

Let G be a Lie group and g its Lie algebra. The tangent bundle tq ■ TG ^ G is trivial meaning 
that it is isomorphic to the product G x g. The identification TG = G x q can be carried out by 
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means of an isomorphism p : TG G x g induced by right translations. If Rg : G ^ G denotes 
the map defined for g £ G hy Rg{h) = kg, then p{v) = {g,TgRg-i{v)), where g = tq {v). We refer 
to this trivialisation as the space coordinates on the tangent bundle. 

Given a smooth function F : G M, the tangent map TF : TG TR it induces will be noted 
TF^^ when written in space coordinates. That is, 

TF^'^ : G X g — > 

{g,X) ^ {F{g),TgFoT,Rg{X)). 

Equivalently, if (ys : g ^ G is a local diffeomorphism from a neighborhood of G g, Tip'"^ : g x 
g — > G X g stands for the tangent map Tip : g x g — > TG in space coordinates, i.e., Tx'p'^^ = 
Tip(x)Ii(p(x)-'^ ° Tx'P, X G 2- It can be immediately checked that 

T{Foip)^ TF"'oTip'"=. 

In order to minimise a function on G or, via a coordinate chart, on g, it will be useful to have a 
notion of gradient field. Assume therefore that we are given an arbitrary metric (•, •) on g. We will 
describe the gradient of a function F : G — > R in space coordinates. For X, F G g, the gradient of 
F o ip : Q ^ R satisfies 

(V (Foip) {X),Y) ^d{Foip) (X) {Y)^pr2oTxiFoip) (F) 
^pr2oT^,^x)F''oTxv''{Y) 

where pr2 : R x R — > R denotes the projection onto the second factor. Therefore, if {F;},-! ^(iim(fl) 
is an orthonormal basis, then 

dim(g) 

V(Fo(^)(X)= ^ {pr2oT^^x)F'-oTx^''{Y,))Y,. (3.1) 

i=l 

Example 6 (Canonical coordinates of the first kind) Let 1^9 : g ^ G be 1^9 {X) := exp {X) g 

for some 5 G G. The exponential map of a Lie group is a local diffeomorphism from a neighborhood 
of G g onto a neighborhood of 5 G G. Using these coordinates, it can be checked that 

Tx^- = ^"P^"'/^'^' = E T^W ° (3.2) 

adx ]^o^^^ 

(see |IMPZ05llE98| ). Canonical coordinates of the first kind are convenient for nilpotent Lie algebras 
because then the series p. 21) becomes a finite sum. Unfortunately, so (N) is not nilpotent. However, 
these coordinates are useful for 50(3) because, in this particular case, the exponential can be easily 
computed by means of the Rodrigues formula. Indeed, if 



one can prove that 




X ^ \ a -c G so(3) 



^ , sin(cr) 1 - cos((t) 
exp {X) = Id+ — ^—^X + 5-^^ , 



where a — vt?+^? + (?, and 



1 - COs(cr) ^ , cr - sin((T) ^3 



Tx exp = Id + ^—^X + T^X 

a G'^ 

For n > 3 several methods have been devised to calculate the exponential map by other means than 
truncation of the series X]n>oo 'h.-^^i "^hich often leads to numerical error (see [BLP05| . |GX02j ) . 
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4 Higher dimensional antithetic variates 

We now present an annealing type method to find the optimal antithetic matrix A* e 0{N) as 
defined in the introduction. Recall that an optimal antithetic is an absolute minimum of the function 
A 1-^ Gov (/ (^) , / (A^)), where / G Cj^(R^) is the payout function depending on a path described 
by a normal vector ^ ~ A/" (0, Idjv)- This is equivalent to minimising U{A) = E [/ (^) / (A^)]. In the 
notation of previous sections we write U{A, £,) = f / (A^). 

The plan is to coordinate SO{N) by means of the exponential map exp(y), Y E so{N), intro- 
duce an iterative algorithm, and study its convergence in the light of Theorem [S] Unfortunately, 
C/(exp(y),^) have not the appropriate behaviour at infinity as specified by hypothesis B2 in Theo- 
rem|31 Indeed, let {X,Y) :— ^tra,ce{XY^) be the standard Euclidean product on so (A^) and let 
its associated norm, Y,Xe so{N). If F = — VE[/ {() f (exp(-)^)] denotes the gradient of [/ o exp 
with respect to an orthonormal basis of so{N) then, since U is bounded, 

(F(Y),Y) 
lim ^ ^ = 0, 

\\y\H^ \\Yf 

which is not strictly negative as B2 requires. In order that our algorithm satisfies this and the rest 
of hypotheses, we will therefore modify L'^(exp(K)) far away from a bounded set with a penalty 
function. 

Before stating the main results of this section, we need an auxiliary lemma. Recall that the 
rank of a Lie group is the dimension of a maximal torus and that SO{N) has rank [N/ 2\, where 
[•J stands for the integer part of a real number. We define R :— lN/2\ for the sake of a simpler 
notation. 



Lemma 7 The compact ball B{Q,tt\ R) C so{N) surjects onto SO{N) by the exponential map. 

Proof. It is a well know fact that any matrix in SO{N) is conjugated to an element in the maximal 
torus of matrices with lN/2\ diagonal blocks of the form 



/ cos 9 — sin 9 
\ sin 9 cos 9 



e [-7r,7r], 



as exp I ^ " 1 . Since conjugation commutes with exponentiation, we conclude that, if 



see |BD851 Chapter IV Theorem 1.6]). It is immediate to see that these rotations can be written 

-9 


K := jdiag {Si, Sii) G 5o{N) \ = (^^" "^"^ j , 9, e [-tt, tt], i = 1, i?| , 

then the compact set | A £ SO{n), k g C so{N) surjects onto SO{n) by the exponen- 

tial map. The lemma follows because, for any A S SO{n) and any k € K, we have 

WAkA'^W^ = i trace (Afc/fc^A^) = ^trace(fcA:^) < Rn^ 

■ 

As stated above, we will use a penalty function to ensure that our algorithm converges. We use 
the notation 1b{Y) for the characteristic function of a set B C so (A^). 
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Definition 8 Let P{Y) be the function defined on so{N) by 

^(n = nP-iB(o,.v^r(n 

The algorithm we present in the following theorem requires some notation. Let Eij e so(A^) be 
the matrix with +1 in the (i,j)th position, —1 in the (j, i)th position and zero elsewhere. The set 
{^ij}a<i<j<N ^ basis of so{N), orthogonal with respect to the aforementioned standard scalar 
product. 

Theorem 9 Let and {Cn}„gN be sequences of independent standard normal N -dimensional 

vectors, Aq € 0{N) and let ip be the exponential function centered at Aq, ip{Y) = exp(y)Ao. Set 
Yq — and Zq = 0. Then the sequence {A„}„gN defined by 

= {f{in)Vf^\A^^,uTY„-i'P""{E,,)An-iU) € 5o(iV) (4.1a) 



r„ - y^-^ -^{^- + . (y„_o) + ^j^^^^C^ (4.ib) 

A„ = exp(y„)Ao (4.1c) 

converges in probability to the optimal antithetic A* ( or the antithetic locus if there are multiple 
minima) in the connected component of SO{N) containing Aq. Here d > is the same as in 
Theorem 

Proof. The proof consists in showing that the iterative process {yn}rieN is the simulated annealing 
algorithm described in Theorem [3] applied to the function J7(^,exp(F)) + P{Y), F S so (N). 

Since C/(^,exp(F)) + P{Y) coincides with J7(^,exp(y)) on i3(0,7r-\/R) and is strictly greater 
outside this ball, the minima of these functions will coincide in B{0,7ry/R)- Thanks to Lemma [7l 
the exponential of the absolute minima of U{(,,exp{Y)) + P{Y) yield the optimal antithetics. 

The Robbins-Monro gradient term in the annealing procedure is 

Fi^,,,Y^^i) - -Vy (C/(e„,exp(y)) + P{Y)) |y=y„_,. (4.2) 

In order to calculate it, let f^Y) — exp(y)Ao and write U^{A) :— U{£,,A), A S 0{n). The tangent 
map TaU^ ■ TaO{N) Ty^(^)R ~ R can be described in space coordinates on X S so(iV) by using 
the chain rule in the following manner 

T^C/f (X) = TaU^ [TmRa (X)) = TuiU^ o Ra) (X) = 

- /(0/(exp(Xt)AO = fiOyPU^XA^, 

where V/''' is the row vector [df/d£,^, . . . ,df/d£,^). Now, let {-£'j:j}o<i<j<Af ^'^ ^^^'^ orthonormal 
basis of so{N) above-mentioned. By Section [3] Equation p.ip . the gradient VyiU^ ° f) in the first 
part of (|4.2p is built from the composition of T^(y)U^'^ with Ty^p'^^ ■ Explicitly, 

Vy[U^ {f{0'^rUTYip'%E,j)AO E,, e so(7V), A = exp(y)Ao. 

i<j 

On the other hand, the second part of (j4.2[) is simply minus the gradient of the penalty function 
P{Y), —2Y ■ Ig^Q T^^y{Y)- Thus, the Robbins-Monro part in the simulated annealing algorithm 
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IS 



i<j 




2Y ■ 1 



S(0,TryS)<= 



evaluated at y = ^,1-17 ^ = ^n-i, and ^ = This yields the iterative scheme described in the 
statement. 

To prove convergence we have to show that hypothesis Bl, B2 and B3 in Theorem |3] hold. Since 
U{^, A) is bounded the only part of the objective function that will contribute in the limits in those 
conditions will be the penalty function P{Y). That is, we can replace F{Y) by — V||y|p — —2Y. 
Bl, B2 and B3 follow easily in this case. ■ 

Remark 10 

1. The choice of initial matrix, Aq, is rather arbitrary unless one has additional information on 
the behaviour of the function /. 

However a couple of observations can be made to aid an informed decision. The first is that 
0{N) has two connected components, SO{N) and SO{N)~ , defined as the preimages of +1 
and —1 by the continuous map det : 0{N) — s- {+1,-1}. The algorithm we will define will 
stay in the connected component that contains Aq. 

The maximum of the covariance function Gov (/ (^) , / (A^)) is achieved at A = Id e SO{N), 
where it values Var[/ (^)]. Assuming the covariance function is continuous, it will be positive 
around the identity matrix Id. Therefore, it seems advisable to choose the initial matrix in the 
other connected component SO{N)~ . It might be of interest to explore whether the minimum 
of the covariance function always achieved in SO{N)~ . 

2. The convergence of algorithm (14. 1|) depends on the constants d > 0. The parameter d controls 
the heat injection in the annealing scheme. As mentioned in Theorem [3l d must be larger 
than d* in Equation (|2.4p . which measures the oscillatory nature of the function. In our 
case, given that the objective function U is essentially the covariance, it can be seen that 
d* < 4 Var[/ (^)]. Since Var[/ (^)] is a number that any user of Monte Carlo will be familiar 
with, we can estimate the magnitude of a valid choice of d. 

3. As stated the algorithm applies to functions / which are differentiable. Many of the functions 
used in pricing derivatives are non differentiable along a hypersurface. In fact the algorithm 
will work provided the set of non-differentiable points has measure zero a condition that will 
hold in most real life scenarios. 

5 Examples 

In this section, we are going to test the performance of our algorithm (j4.ip in finding the optimal 
antithetic for a couple of elementary derivatives, an (arithmetic) Asian option and a covariance 
swap. Both products will be priced under the Black-Scholes model. This means that, under the 
risk-neutral probability, the price of an asset St is expressed as 



5, = 5oe('-'^-'''/2)*+''^S 



(5.1) 



where r stands for the continuous interest rate, d for the continuous dividend yield, and v for the 
volatility, all assumed constant; Bt denotes a standard Brownian motion, i.e., Bt — Bg ~ AA(0, t—s), 
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t > s. On the other hand, if ft{Si, Sk) is the payoff function at time f of a given contract 
depending on k assets, the price of such contract at time t is given by 

e-'-*E[/t(5i,...,5fe)]. (5.2) 

We do not intend to find the right price of such products in a reahstic market, where more 
sophisticated models may be required, but simply implement algorithm (|4.ip and show its efhciency 
and usefulness in standard Monte Carlo simulations. We hope to explore antithetics under other 
asset models in a future work. 

5.1 Asian option 

An (arithmetic) Asian call option with strike price is a derivative contract based on an asset 
price St whose payoff at the expiry time T is given by 



max 



V i=l / 



where < ti < ... < tN = T is a, sequence of times set in the contract at which St is observed. 
Consequently, according to (|5.2p . the price of an Asian call option is 



-rT 



E 



i=l 



(5.3) 



In order to estimate this expectation through a Monte Carlo simulation, we replace each St^ with 
the random variable 



St^ = St,_, e('-'^V2)(t.-t.-i)+V(*'-*-i)C, ^ AA(0, 1), 

so that we can compute the sequence of prices iteratively. Therefore, according to our picture, the 
expectation in (|5.3p can be rewritten as E [/(^)] , where 



/(xi, ...,xn) = max f 0, ^ ^ J]^ ^(r-^V2)(t.-t.-i)+Vfe-*.~i)^.- -K ] andC~AA(0,Id 



In the previous expression, we implicitly assumed to = 0. 

In this particular example, we are going to find the optimal antithetic for an Asian option 
averaged over the asset prices on the first of each month along one year. For the sake of simplicity, 
we suppose that all months have the same number of days. That is, the expiry time T = 1 
equals one year and ti = i/12 years, i = 1, 12. More explicitly, our antithetic matrix will be of 
dimension 12, which means that our optimizing problem (|1.5p takes place on a space of dimension 
dim(so(12)) — 66. We will suppose that the asset price at the initial date is 5*0 = £100 and that 
the strike price equals i^lOO as well. Furthermore, we set r = 2.83% and ly — 10.36%. 

The results obtained are summarized in Table 1. The first row refers to a crude Monte Carlo 
estimation of one Asian option according to the data above-mentioned. The second and the third 
rows contain, respectively, information on the Monte Carlo estimation of (15. 2p using the antithetics 
procedure ()1.4p when the matrix A is minus the identity — Idi2 and the antithetic A* provided by 
algorithm (j4.ip with 7=1/2 and initial condition Aq — — Idi2. For any estimation, we specify 
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the price of the option, the estimated variance cr^ — Var[e~'''^ / (^)] of the method, the error a/y/n 
associated to that estimation, and the time (in seconds) our computer (a laptop with a 1.6 MHz 
CPU) invested in those computations. 





Price (£) 


Variance 


MC Error (cr/V^) 


Time (s) 


Crude Monte Carlo 


3.15 


17.25 


0.01 


8.75 


Antithetic — Idi2 


3.15 


3.50 


0.01 


3.41 


Antithetic A* 


3.15 


3.60 


0.01 


3.54 



Table 1: Monte Carlo simulations for an Asian option 



First of all, we observe that the use of antithetics dramatically reduces the variance a crude 
Monte Carlo simulation by, roughly speaking, 5, which means, once an error threshold is fixed, 
we need less than half of the time required in a crude Monte Carlo to price the option with the 
same accuracy. However, we can see that the use of antithetics in the original Hammersley sense, 
that is, taking — Idi2, slightly beats the antithetic provided by algorithm (|4.ip . This is because, in 
this particular example, — Idi2 is one of the points where the covariance function Cov (/(^), f{A^)) 
reaches its minimum value. If algorithm (|4.1[) does not provide an antithetic so performing is because 
we can only run it for a finite time while its convergence to the set S_ = min^g5(,(i2) Cov {/{£,), f{A^)) 
is assured as t oo. However, despite this fact and that A* is obtained by a numeric method, 
which by definition intrinsically carries some degree of approximation, the result is satisfactory and 
we conclude that A* is very close to an absolute minimum. 

The main drawback of algorithm (14. ip is its speed of convergence. Since at each step we have 
to compute, on the one hand, the exponential of a skew-symmetric matrix (of dimension 12) and, 
on the other, the tangent map Texp"^ to write down the gradient (|4.1ap . the algorithm is very 
time-consuming. For example, the antithetic A* used in the simulations of Table 1 is obtained after 
10000 iterations, which our 1.6MHz computer carried out in approximately forty minutes, nothing 
comparable with the times spent in the Monte Carlo simulations (see Table 1). Therefore, finding 
the optimal antithetic A* before a crude Monte Carlo seems advisable if A* may be reused in several 
simulations. We hope to improve the efhciency of the algorithm in a future work. 



5.2 Covariance swap 

The next example aims at showing that the optimal antithetic A* of some daily traded products 
can be very different from the antithetic — Id which, moreover, turns out to be completely useless. 

A covariance swap depending on two assets Si and 5*2 is a contract that, at the expiry date 
T, pays 




where, as in the case of the Asian option, = to < ti < ... < t^ = T is a sequence of times fixed in 
the contract. This quantity can be negative, which means that the holder of the contract, instead of 
receiving any money, must pay that quantity. It is called covariance swap because, roughly speaking, 
it measures the realised covariance between the returns of the assets 5*1 and 52. Sometimes, the 
returns are alternatively expressed as In {Sti/ Sti_i) . 

In the Balck-Scholes model, where the assets are uncorrelated, the payoff (|5.4p is very small 
(but not zero though, because the spot prices are only observed on a finite number of dates and 
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not continuously). In order to increase the price of (|5.4p so that we avoid working with too many 
decimal places, wc take the maximum between (|5.4p and 0, 



max 




We insist, however, that this is not the way covariance swaps are traded in real markets, where they 
are mainly used as a hedge against possible unwanted correlation effects between assets or indices. 

We are going to fix the details of our simulation. Let i^i = 17.36% and 1^2 = 10.12% be the 
volatilities of and 5*2. Suppose that the asset S2 pays a constant dividend yield d — 2.03% and 
that the constant free-risk interest rate is again r = 2.83%. Hence the assets Si and ^2 follow the 
log-normal processes: 

where i?^ and are two independent Brownian motions. As in the previous example, the price 
of (|5.5p can be written as e~^^ E [/(C)]- Explicitly, 

/(xi, ...,X2n) = max (^0, ^ (^g(r-.?/2)(t,-t,_i)+-i . 

. ^e^''-<'-''2/2)(*3-*3-l)+1^2-y(tj-tj-l)2;2j — ly 

and C ~ A/'(0,Id2Ar) is a Gaussian vector whose odd components j = 1, •.•,iV, are used to 

generate the evolution of Si while the even components C^-' go with 5*2 . It is clear from (|5.6[) that 
the values of Si and 5*2 at time t — play no role in the final price of the covariance swap option, 
so we can take them equal to 1. 

Table 2 summarises the prices of 100 covariance swaps estimated by Monte Carlo. As for the 
Asian option before studied, the asset prices 5*4; are read on the first of every month along one year, 
where all months are supposed to have 30 days, which means that the expiry time T = 1 equals 
one year and ti = i/12 years, i = 1, 12. Our (optimal) antithetic matrix will be of dimension 24, 
so that our optimizing problem (|1.5p takes place on a space of dimension dim(so(24)) = 276. As 
in Table 1, the first row refers to a crude Monte Carlo, the second one to the antithetics procedure 
(II. 4p when the matrix A is minus the identity — Id24, and the third one to the antithetic A* provided 
by algorithm (|4.ip with 7 = 1/2 and initial condition Aq = — Id24. Again, we specify the price of 
the option, the estimated variance a = Var[e-'''^ / (C)] of the method, the error cr j \fn associated 
to that estimation, and the time (in seconds) spent in those simulations. 





Price (£) 


Variance 


MC Error (cr/\Ai) 


Time (s) 


Crude Monte Carlo 


0.198 


0.081 


0.001 


7.28 


Antithetic — Id24 


0.197 


0.081 


0.001 


15.72 


Antithetic A* 


0.198 


0.029 


0.001 


5.02 



Table 2: Monte Carlo simulations for an option on a covariance swap 



We can observe that now the antithetic A = — Id24 does not improve the efficiency of the crude 
Monte Carlo. Indeed, it is easy to check that, in this particular example, Cov(/(C), /(— 0) = 
Var[/(C)], C ^ A/'(0,Id24), so that the variance of (|1.4p is the same of the original (crude) Monte 



(5.6) 
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Carlo. However, since at each step we now evaluate the payoff function / twice, we need twice as 
much time. In conclusion, the antithetic A = — 16.24 is useless. 

On the contrary, the antithetic A* reduces the variance 2.8 times which, in turn, means that 
the computational time is reduced, approximately, by 1.45. Unfortunately, unlike — Id24, A* is an 
orthogonal dense matrix that has no trivial interpretation. And what is worse, as the dimensionality 
of the optimisation problem grows, so does the time needed to obtain a good approximation of A* . 
For example, in the location of the skew-symmetric matrix Y* g so (24) such that exp (Y*) — A* , 
a 1.6MHz laptop invested more than an hour in carrying out 10000 iterations of (|4.ip . Although 
there are certainly more performing computers, this time is huge compared with the times of the 
simulations we want to improve (see Table 2). 



6 A dynamically optimized Monte Carlo 

Section |4] provides an algorithm to locate the optimal antithetic A* . Recall that the use of this 
matrix A* will improve the calculation of m = E[/(^)] as a Monte Carlo average in the form 

n-^oo TL ^ — ^ 2 
k=l 

In practice, as we saw in Section^ it seems too expensive to locate A* and then use it in the Monte 
Carlo calculation. It would be better to use the antithetics dynamically, meaning that we use as 
approximating sum 

k=\ 

In other words, we calculate the optimal antithetic at the same time as we estimate E[/(^)]. 

It is worth emphasizing that the random samples {CnjneN used in the location of A* via the 
simulated annealing (j4.ip can be also recycled in (j6.ip to approximate to. Did we do that, the 
results in this section would continue being valid. However, the expressions would become more 
complex but with no additional content. Therefore, in order to illustrate that antithetic variates 
can be used dynamically, we are only going to study the estimations of m obtained from (|6.ip . 
Furthermore, for the benefit of a simpler notation, we will abbreviate 

5(A,_i,6):=^(/(a-) + /(^fe-iCfc))- 

The random variables {g (j4fe_i , Cfe)}fcgN ^'"^ neither independent nor equally distributed because, 
for any €: N, A^ is not deterministic but depends on the previous random variables {^i, Ci}i=i k- 
Consequently, we cannot invoke the Strong Law of Large Numbers to guarantee that (|6.ip converges 
to TO almost surely. In other words, (|6.ip is different from computing m as 

2 

fe=i 

for the optimal value A*, where A* is supposed to be deterministic and fixed. Moreover, the 
dependence of {g (Afc_i, might cause the appearance of some positive correlations which 

might spoil the efficiency of our method. Nevertheless, our situation is not that bad. Indeed, if we 
define 

1 " 

Sn ■■= - 'Vg(Afe_i,^fc), 

n ^-^ 

k=l 
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we are going to prove in Theorem [TT] that, on the one hand, Sn converges to the expectation 
m — E[/ (^)] almost surely as n —^ oo and, on the other, that \/n{Sn — m) converges in law to a 
normal variable Af (O, cr^) with variance 

= Var [g (^*, 0] = ^ (Var[/(0] + Gov (/ , / (A*^))) , 

^ ~ A/'(0,IdAr). It is worth noticing that cr^ = Var [g (A*, ^)] is the smallest possible variance we 
can get using antithetic variates. 

Finally, in order to estimate crj and thus obtain empirical confidence intervals for m, we prove 
in Theorem [T2I that 

1 " 

- V'g^ {Ak-i,£,k) - — > cr^ in probability. 

n ^ — ^ n^oo 
fc=l 

The following results are inspired by [A041 IA03j . 

Theorem 11 Let A* e 0{N) and suppose that there exists a sequence {^fcjfcgN of random variables 
taking values in 0{N) such that Ak A* in probability as fc — > 00. Let {£,k}i;^ff and {Cfc}/cgN ^ 
couple of sequences of i.i.d Gaussian vectors mutually independent. Let J-^ = cr \ 1 < i < k) 

be the a-algebra generated by {^i, Ci}i=i,...,k and suppose that Ak is Tk- measurable. Furthermore, if 
^ ^ A/'(0,Idjv) denote an independent Gaussian vector, assume that E[|f; {Ak-i,C) P] < °o fof any 
fc G N and that the map A i-^- E[|5 ] is continuous at A* . Then, 



m. 



1 " 

-Y.aiAk-uik) ^ 

12 ^ — ^ n — *oc 

fc=l 

Proof. Define Mq = and 

n 

M„ = ^(5(Afe_i,Cfe)-m), n>l. 



k=l 



Let Tn be the a-algebra generated by {Ci,Ci}i=i n- Since A„_i is JF„_i-measurable and ^„ is 
independent of J-n-i, we have 

E[.g(A„_i,C„) I J'n-i] = E[g{A,Cn)]\A=A„_, ^ rn 

and {A^n}„gN is a martingale with respect to the filtration {^n}neN- imposed that E[(g (A„_i, ^„))^] < 

00 for any n G N, it is not difficult to see that {Mn}^^jq is a square integrablc martingale. On the 
other hand, its angle bracket is given by 

n n 

(M)„ = ^E[(AM,f \:Fk-i] =^(E[5^(A,_i,a-) I ^.-1] 

n 



k=l k=l 
k=l 

Now, since A^ — » A* in probability as fc — ^ 00 and A 1-^ Fj[g'^ (A, £^k)] is continuous at A*, 
E[g'^{A,^k)]\^_j^^ ^ also converges in probability to E[g^(yl, ^fc)] |^_^. by Lemma p^ . Apply- 



ing Lemma we see that 



E[5' (A,0]L^^.-m2 = Var[g(A*,e)] 



in probability. Therefore, by Theorem [15] (i), we conclude that -Mn — > a.s 
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Theorem 12 (Central Limit Theorem) With the same notation as in Theorem\T^ assume now 
that E[|5 {Ak-i,^) I"*] < oo for any fc £ N and that A ^ E[|g is continuous at A* , 1 < p < A. 

(i) The following convergence in law holds: 

n — *oo 

where a1 := Var [g {A* , ^)] . 

(ii) Let a\ \ YTk=\ 9^ (^fc-i: Cfc) - S^. Then '"-^ crj in probability. 
Proof. 

(i) First of all, observe that 

E[\g{Ak-u^k)-m\^ \ J'k-i] - E (A^^i, 6) I ^fe-i] - Sm^ 

- 4toE [g3 {Ak-i,^k) I ^fc-i] + 6m^E[g^ (Afc-i,6) I -^fc-i] 
= E[/(A6)]L=^_-3m4 

- 4m E [g3 (A, a)] \^^^^_^ + 6m' E [g^ (A, Cfc)] | 



Since A i— > E[|(7 ^) |p] is continuous and Ak — > A in probability as fc — > oo, by Lemma 

(fT4|) we have that E [\g {Ak^i,S^k) ~ rn\^ \ JFfc-i] converges in probability to a positive random 
variable L S (fi, P) as fc — + oo. Thus, by Lemma (fT3|) . 



1 " 

- VE[|5(A,_i,4)-mn 



fc=i 



in probability. 

For any a > 0, define 



1 " 



n 



It can be easily checked that 

FAa) < VE[|g(A-i,a)-"^l' I -^^/.-i] 



fc=i 



so that, taking the limit superior of the sequence {F„ ('3')}„gN probability, we obtain that 
limsup^^o^ Fn (a) < a~'L. If now a — e^fn with e > fixed, we have 

limsupF„ {e\fn) — 

n — *oo 

in probability and the Lindberg's condition holds. Therefore, yjn {Sn — m) ^Izl^' j\f (^g, crj) in 
law by Theorem [15] (ii) . 
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(ii) Essentially, we only need to show that i Y.k=i ^ [5^ 0] Ia^a^-i « T.k=i 5^ (^fe-i, Cfe) 
have the same limit a.s.. We are going to do so by mimicking the proof of Theorem [TT] and 
quoting Theorem 1151 

Define Mq = and 



Mr, 



n 



fc=l 



n > 1. 



It is not difficult to check that {M„}^gp^ is a square integrable martingale with respect to 
{•^"InGN' -^n = f Ci M = •■•I whosc angle bracket is given by 



fe=l 



By the continuity of A i-^ E [|g (A, ^)|^], 1 < p < 4, and Lemma [T^ we have 

in probability. Then Theorem [15] (i) claims that 

-5]E[g2(A6)]L_^, Oa.s.. 



fc=l 



fc=l 



Since E [g^ (A,^^)] |^^^^ ^ = E [g^ (^jC)] lA=Afc i ^"-"^ arbitrary Gaussian vector ^, we 

conclude that i X]fe=i 5^ (^fc-i, Cfe) and i I]fc=i E [5^ (A, 0] L=Afc_i '^'^'^'^ '^^^'^ 
most surely. 

Finally, as ^ X]fe=i E [g^ (^,0] L=Afc 1 ^^"^ converge in probability to E [g^ (A*,0] and 
E [g [A* , as n — > 00 respectively, we obtain that 

1 " 

'^n = - E 5' ' Cfe) - ^ E [52 (A* , ^)] _ E [5 [A* ,0f^ Var [g [A* , 0] 



in probability. 



A Appendix 

In this appendix we recall some auxiliary results. The first one is a probabilistic version of the 
so-called Cesaro Lemma and reads 

Lemma 13 Let {CnJngN ^'wr (^i-f) ^ sequence of random variables variables and let ^ S 
P). If £,n in probability as n ~* 00, then 

1 " 

— / Cfc *■ ? *^ probability as well. 
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e{n - no) 

< e. 



Proof. Let C, ?7 G L°.^ i^, P)- The function d (C, ??) = E [min (1, |1C - ?7||)] is a distance in L°.^ {Q, P) 
compatible with convergence in probabihty. That is, a sequence {f7n}„gN C L'^r P) converges 
in probabihty to 77 G L^r (^1, P) if and only if d {rin, 77) ^ as n — > 00. 

Let e > and let riQ G N be large enough so that d{£^n,£,) < £/2 for any n > uq. We can choose 
ni > no so that 

-i:<'(6.«)<l- 

Then, for any n > ni we have 

/ 1 X — ^" \ 1 X — ^" 1 X — ^"0 1 v — ^nl 

Consequently, i Cfe converges to ^ in probability. ■ 

The next lemma is rather elementary and well known. However, we state and prove it for the 
benefit of a clearer exposition. 

Lemma 14 Let X be a topological space. Let {xn}nefi ^'xi^^P) sequence of random 

variables variables and x £ X. Let f : X W be a function continuous at x. If 
probability as n oo, then f{xn) — s- / (x) in probability as well. 

Proof. Let £ > 0. By the continuity of /, there exists an open neighborhood Vx around x such 
that 1/ (y) - f{x)\ < e for any y e V^. Then, 

{LOen\ \f (Xn) - f{x)\ >e}C{Luen\ .T„ ^ V^} . 

Since P ({w G ri | a;„ ^ T^}) — ^ as n oo because {a;„}^gpj converges in probability to a; G M, we 
see that 

P{{Luen\ \fixn)~fix)\>e}) 

n — >oo 

as well, and {/(a;n)}„gN converges in probability to / (a;). ■ 

Unlike the previous results, the following theorem is much deeper. Roughly speaking, it is a 
powerful tool to prove general versions of the Law of Large Numbers and the Central Limit Theorem 
for sequences of random variables which are neither independent nor equally distributed. Its proof 
can be found in [D97l Corollary 2.1.10]. 

Theorem 15 Let {M„},jgpj be a real, square-integrable martingale adapted to a filtration {^n}neN- 
Let (M)^ denote its angle bracket process. Let {a„}^gpj be an increasing sequence such that a„ — > cx) 
as n ^ oo. 

(i) // — > > in probability, then ^ > a.s.. 

(ii) If Lindbergs's condition holds, that is, 



k=l 



Mk - Mk-i\^ l{|Mfc-Affc_i|>eV^} I ^k-i * in probability 



then — > TV (O, a'^) in law. 
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